A MPI-like Code for Gaussian Elimination; A.1 (block,*) Distribution; A.2 (cyclic,*) Distribution; 6 Concluding Remarks; Acknowledgement

Authors

  • J. Choi
  • J. J. Dongarra
  • R. Pozo
Abstract

6 Concluding Remarks

Providing a rich set of libraries to programmers in MPCs is in high demand, but it is not a trivial task. This paper discussed four major features required in the design of scalable libraries: scalability, portability, recompilation, and flexibility. We advocate a layered structure of libraries to meet different demands and requirements from different perspectives. In general, the higher two layers provide more flexibility and the lower two layers provide better performance. However, as compiler technologies mature, the higher layers may also provide better performance. Both the number of processors and the problem size have a great impact on the library …

A MPI-like Code for Gaussian Elimination

This section shows the MPI-like code for two distributions, (block,*) and (cyclic,*), based on Gaussian elimination and backward substitution. Both implementations are coded in the form of F90/HPF extrinsic procedures.

A.1 (block,*) Distribution

    ! Linear system solver library in MPI-code with (block,*) decomposition
    SUBROUTINE LINSYS-BLST(B,x)
    !HPF$ LOCAL
      REAL INTENT (IN) :: B(:,:)
      REAL INTENT (OUT) :: x(:)
      INTEGER xindx(UBOUND(B,2)-1)
      REAL temp(UBOUND(B,2)-1)
      INTEGER gindex(2), lindex(2)

      ! xindx records the column permutation
      DO i = 1, n
        xindx(i) = i
      END DO

      ! forward elimination
      DO i = 1, n
        gindex(1) = i
        gindex(2) = i
        CALL GLOBAL-TO-LOCAL(B,gindex,lindex,local)
        IF (local .EQ. true) THEN
          maxloc = MAXLOC(B(lindex(1),lindex(2):UBOUND(B,2)-1))
        ENDIF
        proc = ⌊i / ⌈n/p⌉⌋
        MPI-BCAST(maxloc,tag1,all,proc)
        maxval = B(lindex(1),maxloc)
        ! swap the pivot column into place
        temp = B(:,maxloc)
        B(:,maxloc) = B(:,lindex(2))
        B(:,lindex(2)) = temp
        tempx = xindx(maxloc)
        xindx(maxloc) = xindx(i)
        xindx(i) = tempx
        MPI-BCAST(B(lindex(1),lindex(2):UBOUND(B,2)),tag2,all,proc)
        DO j = lindex(1)+1, UBOUND(B,1)
          DO k = lindex(2)+1, UBOUND(B,2)
            B(j,k) = B(j,k) - B(j,lindex(2)) * B(lindex(1),k) / B(lindex(1),lindex(2))
          END DO
        END DO
      END DO

      ! backward substitution
      DO i = n, 1, -1
        gindex(1) = i
        gindex(2) = i
        CALL GLOBAL-TO-LOCAL(B,gindex,lindex,local)
        IF (local .EQ. true) THEN
          x(xindx(i)) = B(lindex(1),UBOUND(B,2)) / B(lindex(1),lindex(2))
          …

A.2 (cyclic,*) Distribution

    ! Linear system solver library in MPI-code with (cyclic,*) decomposition
    SUBROUTINE LINSYS-CYST(B,x)
    !HPF$ LOCAL
      REAL INTENT (IN) :: B(:,:)
      REAL INTENT (OUT) :: x(:)
      INTEGER xindx(UBOUND(B,2)-1)
      REAL temp(UBOUND(B,2)-1)
      INTEGER gindex(2), lindex(2)

      ! xindx records the column permutation
      DO i = 1, n
        xindx(i) = i
      END DO

      ! forward elimination
      DO i = 1, n
        gindex(1) = i
        gindex(2) = i
        CALL GLOBAL-TO-LOCAL(B,gindex,lindex,local)
        IF (local .EQ. true) THEN
          maxloc = MAXLOC(B(lindex(1),lindex(2):UBOUND(B,2)-1))
        ENDIF
        proc = m mod p
        MPI-BCAST(maxloc,tag1,all,proc)
        maxval = B(lindex(1),maxloc)
        ! swap the pivot column into place
        temp = B(:,maxloc)
        B(:,maxloc) = B(:,lindex(2))
        B(:,lindex(2)) = temp
        tempx = xindx(maxloc)
        xindx(maxloc) = xindx(i)
        xindx(i) = tempx
        …
            B(j,k) = B(j,k) - B(j,lindex(2)) * B(lindex(1),k) / B(lindex(1),lindex(2))
          END DO
        END DO
      END DO

      ! backward substitution
      DO i = n, 1, -1
        gindex(1) = i
        gindex(2) = i
        CALL GLOBAL-TO-LOCAL(B,gindex,lindex,local)
        IF (local .EQ. true) THEN
          x(xindx(i)) = B(lindex(1),UBOUND(B,2)) / B(lindex(1),lindex(2))
          proc = i mod p
          MPI-BCAST(x(xindx(i)),tag3,all,proc)
          DO j = lindex(1)-1, LBOUND(B,1), -1
            B(j,UBOUND(B,2)) = B(j,UBOUND(B,2)) - B(j,lindex(2)) * x(xindx(i))
          END DO
        ENDIF
      END DO
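As a rough aid to reading the MPI-like listings, the following is a minimal serial sketch in Python of the same scheme: a pivot search along the current row, a column swap tracked in `xindx`, elimination below the pivot, and backward substitution. It is not the authors' code. The function names `owner_block`, `owner_cyclic`, and `solve` are hypothetical, indices are 0-based (the listings are 1-based), and the pivot is chosen by absolute value, which is an assumption; the listings call `MAXLOC` on the raw values.

```python
import math

def owner_block(i, n, p):
    """Owner of global row i (0-based) under (block,*): floor(i / ceil(n/p))."""
    return i // math.ceil(n / p)

def owner_cyclic(i, p):
    """Owner of global row i (0-based) under (cyclic,*): i mod p."""
    return i % p

def solve(B):
    """Serial reference solver; B is an n x (n+1) augmented matrix (list of lists)."""
    n = len(B)
    B = [row[:] for row in B]        # work on a copy
    xindx = list(range(n))           # column permutation, like xindx in the listings
    for i in range(n):
        # pivot search along row i over coefficient columns i..n-1
        # (absolute value is an assumption of this sketch)
        m = max(range(i, n), key=lambda c: abs(B[i][c]))
        if B[i][m] == 0:
            raise ValueError("matrix is singular")
        # swap columns i and m in every row and record the swap
        for row in B:
            row[i], row[m] = row[m], row[i]
        xindx[i], xindx[m] = xindx[m], xindx[i]
        # eliminate column i from the rows below the pivot
        for j in range(i + 1, n):
            f = B[j][i] / B[i][i]
            for k in range(i + 1, n + 1):
                B[j][k] -= f * B[i][k]
    # backward substitution; xindx maps column i back to the original unknown
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        xi = B[i][n] / B[i][i]
        x[xindx[i]] = xi
        for j in range(i):
            B[j][n] -= B[j][i] * xi
    return x
```

In the distributed versions, the row owner returned by `owner_block` or `owner_cyclic` would be the process that finds the pivot and broadcasts the pivot row; here everything runs on one process, so the two owner functions only illustrate the mapping.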


Related resources

An analysis of data distribution methods for Gaussian elimination in distributed-memory multicomputers

In multicomputers, an appropriate data distribution is crucial for reducing communication overhead and therefore improving overall performance. For this reason, data-parallel languages provide programmers with primitives, such as BLOCK and CYCLIC, that can be used to distribute data across the distributed memory. However, the languages do not aid the programmer as to how the distribution should be per...


Computing a block incomplete LU preconditioner as the by-product of block left-looking A-biconjugation process

In this paper, we present a block version of the incomplete LU preconditioner, which is computed as the by-product of the block A-biconjugation process. The pivot entries of this block preconditioner are one-by-one or two-by-two blocks. The L and U factors of this block preconditioner are computed separately. The block pivot selection of this preconditioner is inherited from one of the block versions of...


Toward Automatic Distribution

This paper considers the problem of distributing data and code among the processors of a distributed memory supercomputer. Provided that the source program is amenable to detailed dataflow analysis, one may determine a placement function by an algorithm analogous to Gaussian elimination. Such a function completely characterizes the distribution by giving the identity of the virtual processor on ...


Analysis of the Asymptotic Performance of Turbo Codes

Battail [1989] shows that an appropriate criterion for the design of long block codes is the closeness of the normalized weight distribution to Gaussian. A subsequent work by Biglieri and Volski [1994] shows that iterated product of single parity check codes satisfies this criterion. Motivated by these works, in the current article, we study the performance of turbo codes for large block length...


Data and Workload Distribution in a Multithreaded Architecture

Matching data distribution to workload distribution is important to improve the performance of distributed-memory multiprocessors. While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, it is often difficult to do so for various reasons including complexity of address computation, runtime data movement, and irregular reso...




Publication date: 1993